SEA, 19 November 2018

What will I talk about?

  • How we have used R Markdown in our undergraduate courses
  • Reproducibility with R Markdown
  • And some things that we think are just cool in Rmd.
  • BIG Thank you to Project TIER and the Alfred P. Sloan Foundation
  • Currently R & R for J. Econ Ed… (fingers crossed)

Benefits of R Markdown

  • One-stop solution
  • Work is reproducible
  • Calibrated for collaboration
  • Free (p = 0)
  • Broadly used in industry & research

Costs and alternatives?

Learning Curve for R and RStudio:

  • Packages facilitate learning:
    • tidyverse, mosaic, stargazer
  • error detection
  • autocomplete
  • online educational courses
  • plays well with Python (via the reticulate package), SQL, JSON and others

Costs and alternatives?

Stata and dyndoc (Stata 15)?

  • Way too clunky
  • different code for different formats (Word, PDF, HTML)
  • doesn't produce output in the file (.do)
  • Conclusion: not a reasonable alternative
  • Not yet a "Stata Markdown"

Python and Jupyter Notebooks

  • also free
  • not geared to statistical programming

How we have used R Markdown

  • Introductory econometrics
  • Environmental Economics
  • Behavioral Economics
  • Business analytics
  • Seminars, theses and special studies

  • Our Class sizes? 10-35
    • Elsewhere: 80-120 (Duke, UCLA, Smith SDS/CS)

R Markdown & Reproducibility

  • What happens in a traditional research report?
  • Are traditional research reports easily reproducible?
  • What gives us soup to nuts reproducibility?
  • Answer: R Markdown or scripted LaTeX + Stata (StatTag?)
  • Works within the TIER framework too

Traditional Reports

Courtesy of Bray, 2016

The Good

  • familiar format, e.g. Word
  • easy learning curve

The Bad

  • tough for reproducibility
  • difficult to update
  • mistakes crop up
  • teams can't collaborate easily

The Ugly?

  • Word/GDocs = Ugly?

Raw Markdown

Knitted Markdown

Text Formatting

# Header 1

## Header 2

### Header 3

This is normal sized text used in the body of our work. 

For bullet points, we use dashes, e.g. 

- Intro to RStudio
- More content
  - a sub-point
- Back to the original level

Document Types

R Markdown can produce a variety of document types beyond the default HTML page:

  • pdf_document makes a PDF with LaTeX (.pdf)

  • word_document for Microsoft Word documents (.docx).

  • odt_document for OpenDocument Text documents (.odt).

  • rtf_document for Rich Text Format documents (.rtf)

And others.
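A minimal sketch of choosing the output type from the R console instead of the Knit button, assuming the rmarkdown package and pandoc are installed. The file name and its contents are made up for illustration; `render()` and its `output_format` argument come from rmarkdown.

```r
# Write a tiny, hypothetical .Rmd file to a temporary folder.
fence <- strrep("`", 3)  # three backticks, built here to keep this example readable
rmd_path <- file.path(tempdir(), "example-report.Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Summary of the built-in cars data:",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Render only when rmarkdown and pandoc are available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  # Same source file, two output types; each call returns the path of the file it wrote.
  html_file <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
  docx_file <- rmarkdown::render(rmd_path, output_format = "word_document", quiet = TRUE)
  print(basename(c(html_file, docx_file)))
}
```

The point for students is that the analysis is written once and only the `output_format` changes.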

Presentation Types

R Markdown can also be re-purposed to produce a presentation file (as with this presentation):

  • ioslides_presentation opens in your browser and is interactive (.html)

  • slidy_presentation is another browser-based presentation format (.html)

  • beamer_presentation makes a PDF with LaTeX (.pdf)

Data work

Think about data analysis as falling into three loose categories:

  • management & wrangling
  • visualization & summary statistics
  • modeling & inference

All of this occurs in the code "chunk"

Code chunks

  • Create a chunk: press Cmd + Option + I (macOS) or Ctrl + Alt + I (Windows)

  • Open a chunk: or type three backticks ``` followed by {r}

  • Close a chunk: then type three more backticks ``` on a line of their own

  • Options: inside the braces you can specify options, e.g. {r, eval = FALSE} if you don't want the code evaluated

  • Labels: {r cars} labels the chunk "cars", so you can jump to it from RStudio's chunk menu

Code Chunk: Example

```{r cars, echo = TRUE}
summary(cars)
```

The option echo = TRUE means that the code gets included in the rendered html.

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
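Rather than repeating `echo` in every chunk header, a common pattern is to set a document-wide default once in a setup chunk. The sketch below assumes knitr is installed (it is wherever R Markdown knits); the choice of `echo = FALSE` as the default is only an example.

```r
# Usually placed in the first chunk of the .Rmd, e.g. one headed {r setup, include = FALSE}.
if (requireNamespace("knitr", quietly = TRUE)) {
  # Hide code by default in every chunk that follows.
  knitr::opts_chunk$set(echo = FALSE)
  # Check the current default; a header such as {r cars, echo = TRUE} still overrides it.
  print(knitr::opts_chunk$get("echo"))
}
```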

Slide with Plot (Reproduction of Sutter, 2009)

  • The code that produces this plot is not displayed (e.g. with the chunk option echo = FALSE)

Dynamic Graphs

  • Same as FRED dynamic graphs (students think this is very cool)

Plotly Graphs

Alter and check some data

##   session subject  r1  r2  r3  r4  r5  r6  r7  r8  r9  treatment team
## 1       1       1   0   0   0   0  10  10   0   0   0 individual   NA
## 2       1       2   0   0  30  40  40   0   0   0  20 individual   NA
## 3       1       3  30  30   0   0   0  60  60  10   0 individual   NA
## 4       1       4  20   0 100   0   0  30  75 100 100 individual   NA
## 5       1       5 100 100 100 100 100 100 100 100 100 individual   NA
## 6       1       6 100 100 100 100 100 100 100   0   0 individual   NA
##           uniqid
## 1 1_individual_1
## 2 1_individual_2
## 3 1_individual_3
## 4 1_individual_4
## 5 1_individual_5
## 6 1_individual_6
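The printed rows are in wide form (one column per round, r1 to r9), while the tests and regressions later in the talk use a single `value` column. One way to get from one to the other is sketched here with base R on the first three subjects and rounds from the printout. The object names `wide` and `long` and the column name `round` are placeholders, and this is not necessarily how the original data were prepared.

```r
# First three subjects, rounds r1-r3, copied from the printed rows.
wide <- data.frame(
  session   = 1,
  subject   = 1:3,
  r1        = c(0, 0, 30),
  r2        = c(0, 0, 30),
  r3        = c(0, 30, 0),
  treatment = "individual"
)

# Build an identifier in the same pattern as uniqid: session_treatment_subject.
wide$uniqid <- paste(wide$session, wide$treatment, wide$subject, sep = "_")

# Stack the round columns into one `value` column: one row per subject and round.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("r1", "r2", "r3"),
  v.names   = "value",
  timevar   = "round",
  times     = 1:3,
  idvar     = "uniqid"
)

head(long)
```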

Statistical Tests

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  value by treatment
## W = 52876, p-value = 3.838e-10
## alternative hypothesis: true location shift is not equal to 0
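The line `data:  value by treatment` indicates the formula interface of `wilcox.test()`, which needs exactly two treatment groups. A minimal sketch on simulated data follows; the data frame `toy`, its group labels, and its numbers are invented so the call runs, and they are not the experimental data.

```r
set.seed(1)

# Two hypothetical treatment groups with 30 simulated observations each.
toy <- data.frame(
  treatment = rep(c("individual", "teamtreat"), each = 30),
  value     = c(runif(30, 0, 100), runif(30, 20, 100))
)

# Rank-sum test of whether the two groups differ in location.
rank_test <- wilcox.test(value ~ treatment, data = toy)
rank_test
```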

Regression output

## 
## Call:
## lm(formula = value ~ treatment, data = SutNarrow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -61.370 -29.385  -0.542  38.630  60.615 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          39.385      1.451  27.152  < 2e-16 ***
## treatmentmessage     21.985      1.994  11.028  < 2e-16 ***
## treatmentmixed       10.609      1.925   5.510 3.92e-08 ***
## treatmentpaycomm     10.886      2.144   5.077 4.09e-07 ***
## treatmentteamtreat   16.313      2.629   6.204 6.34e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34.81 on 2713 degrees of freedom
## Multiple R-squared:  0.04473,    Adjusted R-squared:  0.04333 
## F-statistic: 31.76 on 4 and 2713 DF,  p-value: < 2.2e-16
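The `Call:` line shows the model formula. Below is a sketch of the same kind of call on simulated data; `toy` stands in for `SutNarrow`, which is not included with the slides, and the values are invented. With a factor regressor, `lm()` reports each treatment's mean difference from the omitted baseline level, which is why the intercept equals the baseline group's mean.

```r
set.seed(2)

# Three hypothetical treatment groups with 40 simulated observations each.
toy <- data.frame(
  treatment = factor(rep(c("individual", "message", "mixed"), each = 40)),
  value     = c(rnorm(40, 40, 30), rnorm(40, 60, 30), rnorm(40, 50, 30))
)

fit <- lm(value ~ treatment, data = toy)
summary(fit)

# The intercept is the mean of the baseline group ("individual", first level alphabetically).
baseline_mean <- mean(toy$value[toy$treatment == "individual"])
all.equal(unname(coef(fit)["(Intercept)"]), baseline_mean)
```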

Or a Panel Regression

## Oneway (time) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = value ~ treatment, data = SutNarrow, effect = "time", 
##     model = "random", index = c("uniqid"))
## 
## Balanced Panel: n = 302, T = 9, N = 2718
## 
## Effects:
##                    var  std.dev share
## idiosyncratic 1197.631   34.607 0.987
## time            16.062    4.008 0.013
## theta: 0.555
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -61.9325 -28.6639  -2.5889  33.7276  64.3014 
## 
## Coefficients:
##                    Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept)         39.3854     1.9657 20.0367 < 2.2e-16 ***
## treatmentmessage    21.9850     1.9818 11.0936 < 2.2e-16 ***
## treatmentmixed      10.6093     1.9140  5.5430 3.259e-08 ***
## treatmentpaycomm    10.8862     2.1315  5.1072 3.497e-07 ***
## treatmentteamtreat  16.3130     2.6138  6.2412 5.022e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    3403100
## Residual Sum of Squares: 3249200
## R-Squared:      0.045244
## Adj. R-Squared: 0.043836
## F-statistic: 32.1409 on 4 and 2713 DF, p-value: < 2.22e-16

Even Fancy Regression Output

                        Dependent variable:
                               value
treatmentmessage             21.985***
                              (1.994)
treatmentmixed               10.609***
                              (1.925)
treatmentpaycomm             10.886***
                              (2.144)
treatmentteamtreat           16.313***
                              (2.629)
Constant                     39.385***
                              (1.451)
Observations                   2,718
R2                             0.045

Standard errors in parentheses.
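Tables in this layout are what the stargazer package (listed earlier among the helpful packages) produces from a fitted model. The sketch below assumes stargazer is installed and uses simulated data; `toy` and its values are invented for illustration.

```r
set.seed(3)

# Two hypothetical treatment groups with simulated outcomes.
toy <- data.frame(
  treatment = factor(rep(c("individual", "message"), each = 40)),
  value     = c(rnorm(40, 40, 30), rnorm(40, 60, 30))
)
fit <- lm(value ~ treatment, data = toy)

if (requireNamespace("stargazer", quietly = TRUE)) {
  # type = "text" prints a plain-text table to the console. In a chunk with
  # results = "asis", type = "html" or "latex" yields a formatted table when knitted.
  stargazer::stargazer(fit, type = "text")
}
```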

How we have used R Markdown: Michael

Michael O'Hara:

Senior thesis seminar with 9 students

  • Very little background
  • All stages in R Markdown
    • Data manipulation, visualization, analysis
    • Presentations
    • Final paper

How we have used R Markdown: Michael

Advantages:

  • To students
    • One environment for everything
  • To me
    • Full reproducibility of all 9 papers (one award)
    • Much easier to reproduce in one document
    • Very professional appearance (with template)

Cost

  • Somewhat higher startup cost in teaching them R and RStudio

How we have used R Markdown: Aaron

  • Create slides in an Econometrics course
  • Final project in senior research seminar
  • Create custom progress reports for students
  • Interactive shiny apps for micro students

How we have used R Markdown: Simon

  • Behavioral Economics (upper-level; 20-35 students)
  • Slides, notes and assignments & labs
  • Students do a reproducible research project
  • Reproduce the results of a published paper
  • Propose a new experimental design to test a new hypothesis

How we have used R Markdown: Tomas

  • Business Analytics

Math?

How about Bayes' Rule?

\[\Pr(\mbox{Outcome} | \mbox{signal}) = \frac{\theta p}{\theta p + (1 - \theta)(1 - p)}\]

R Markdown uses \(\LaTeX\) for math, and RStudio previews it immediately.

  • That is, \(\LaTeX\) without the challenges of learning the packages, tables, etc. that make learning \(\LaTeX\) so hard.
  • In-line equations are bracketed by single dollar signs $.
  • Off-set equations are bracketed by double dollar signs $$.
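Because the math sits next to code, students can check a formula numerically in the same document. A small sketch for Bayes' rule, reading \(\theta\) as the prior probability of the outcome and \(p\) as the probability that the signal is correct (an interpretation assumed here, since the slide does not define the symbols):

```r
# Posterior probability of the outcome after observing the signal.
# theta: prior probability of the outcome; p: probability the signal is correct.
posterior <- function(theta, p) {
  (theta * p) / (theta * p + (1 - theta) * (1 - p))
}

posterior(theta = 0.5, p = 0.8)  # 0.4 / (0.4 + 0.1) = 0.8
posterior(theta = 0.1, p = 0.8)  # 0.08 / (0.08 + 0.18), about 0.31
```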

What else?

R Markdown and RStudio together have excellent capabilities.

  • RStudio can show you the output of the commands within the R Markdown file ("lab notebook")
  • RStudio has error detection and debugging assistance for your code (unlike, e.g., Stata or aspects of Excel)
  • RStudio Server can be hosted online, and your students work there with their own logins

Lessons from experience

  • Students will only learn commands through graded assignments
  • Students can struggle with basic computing (working directory, file paths), but autocomplete helps
    • "drag-and-dropitis"; initial inequality in baselines can be tough
  • Students have to adjust to get the basics right
    • Difference between a script (.R file, like a .do file) and a markdown document/notebook (.Rmd) (medium adjustment)
    • Difference between the .Rmd and the exported file: PDF, HTML, etc. (quick adjustment)
  • Students like (or are used to) WYSIWYG, which Rmd is not
    • They are accustomed to MS Word & Google Docs, which are WYSIWYG
  • Installing packages
    • analogy: install apps (packages) to do different things on your phone (RStudio & R)
    • Chrome extensions
  • Server = REALLY GREAT!
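For the package and file-path hurdles in the list above, a short sketch of two habits worth teaching early: checking which packages still need installing, and building paths with `file.path()` instead of typing separators by hand. The package names are the ones mentioned earlier in the talk; the `data` folder and the file name `sutter.csv` are hypothetical.

```r
# Which of the course packages are not yet installed on this machine?
needed <- c("tidyverse", "mosaic", "stargazer")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  # Installing is a one-time step (like installing an app); library() is run every session.
  message("Run once: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}

# Where is R looking for files right now?
getwd()

# Build a relative path from the working directory, then check that the file exists.
data_path <- file.path("data", "sutter.csv")
file.exists(data_path)
```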

Suggestions

  • Use templates of R Markdown files
    • easy for students to "fill in blanks"
    • Templates for projects & bibliography
    • Examples of math
    • Examples of tables
  • Use lab exercises liberally (please borrow ours!)

R Link Love?

Acknowledgments